ABSTRACT

Motivation: Membrane proteins are an important class of biological
macromolecules involved in many key cellular processes, including
signalling and transport. They account for one third of genes in the
human genome and >50% of current drug targets. Despite their
importance, experimental structural data are sparse, resulting in high
expectations for computational modelling tools to help fill this gap.
However, as many empirical methods have been trained on experimental
structural data, which are biased towards soluble globular proteins,
their accuracy for transmembrane proteins is often limited.
Results: We developed a local model quality estimation method for
membrane proteins ('QMEANBrane') by combining statistical potentials
trained on membrane protein structures with a per-residue weighting
scheme. The increasing number of available experimental membrane
protein structures allowed us to train membrane-specific statistical
potentials that approach statistical saturation. We show that reliable
local quality estimation of membrane protein models is possible,
thereby extending local quality estimation to these biologically
relevant molecules.

Availability and implementation: Source code and datasets are
available on request.

Contact: torsten.schwede@unibas.ch 

Supplementary Information: Supplementary data are available at 
Bioinformatics online. 



1 INTRODUCTION 

Protein modelling plays a key role in exploring sequence-structure
relationships when experimental data are missing. Modelling
techniques using evolutionary information, in particular
homology/comparative modelling, have developed into standardized
pipelines over recent years. An indispensable ingredient of such a
pipeline is the accuracy estimation of a protein model, directly
providing the user with information regarding the range of its
possible applications (Baker and Sali, 2001; Schwede, 2013;
Schwede et al., 2009). In this context, global model quality
assessment tools are important for selecting the best model among
a set of alternatives, whereas local model estimates assess the
plausibility and likely accuracy of individual amino acids
(Benkert et al., 2011; Fasnacht et al., 2007). Various techniques
have been developed to address this question, with consensus
methods and knowledge-based approaches showing the best results
in blind assessments (Kryshtafovych et al., 2014). Consensus
approaches require an ensemble of models with structural variety,
reflecting alternative conformations (Roche et al., 2014;
Skwark and Elofsson, 2013). In contrast, knowledge-based methods
(such as statistical potentials) can be applied to single models but
are in general less accurate than consensus methods and exhibit a
strong dependency on the structural data they have been trained on.

The unique physicochemical properties of biological membranes
give rise to interactions that are energetically discouraged
in soluble proteins, and vice versa (White, 2009). However, most
scoring functions using knowledge-based methods (Benkert
et al., 2011; Luthy et al., 1992; Ray et al., 2012; Sippl, 1993;
Zhou and Zhou, 2002) have been trained on soluble proteins.
Thus, they perform poorly when applied to models of membrane
proteins. This specific but highly relevant aspect of protein
model quality assessment has received little attention in
recent years (Heim and Li, 2012; Ray et al., 2010). With
the growing number of available high-resolution membrane protein
structures (Garman, 2014; White, 2004), the template situation
for homology modelling procedures is improving quickly
and, even more important for this work, it is gradually becoming
possible to adapt knowledge-based methods to this class of
models.

As a result of such efforts, we present QMEANBrane, a combination
of statistical potentials targeted at local quality estimation
of membrane protein models in their naturally occurring
oligomeric state: after identifying the transmembrane region
using an implicit solvation model, specifically trained statistical
potentials are applied to the different regions of a protein model
(Figs 1 and 2). To overcome statistical saturation problems, a
novel approach for deriving statistical potentials from sparse
training data has been devised. We have benchmarked the
performance of the approach on a large heterogeneous test set of
models and illustrate the results using the example of alignment
errors in a transmembrane model.



2 METHODS 

2.1 Target function 

The similarity/difference between a model and a reference structure can
be expressed in the form of distances between corresponding atoms in the
model and its reference structure after performing a global superposition.
However, this global superposition approach fails to give accurate results
in case of domain movements. To overcome such problems, e.g. in the
context of the CASP experiments (Moult et al., 2014), the structures are
manually split into so-called assessment units and evaluated separately
(Taylor et al., 2014). This manual procedure is time consuming and not
suitable for automated large-scale evaluation such as that performed by
CAMEO (Haas et al., 2013). Alternatively, similarity/difference between
a model and reference structure can be expressed in the form of
superposition-free measures such as the local Distance Difference Test
(lDDT) score (Mariani et al., 2013), which assesses the differences in
interatomic distances between model and reference structure. In this
work, the lDDT inclusion radius is set to 10 Å to ensure local behaviour.
See Supplementary Figure S2 for a comparison of different structural
similarity measures (Cα-distance, dRMSD, lDDT and CAD score;
Olechnovic et al., 2013).

[Figure 1, axis labels: Deviation Between Calculated and Reference
Membrane Axis (°); Deviation Between Calculated and Reference Membrane
Width (Å)]

Fig. 1. Difference between membrane predictions of our algorithm and
the predictions of OPM on the 200 high-resolution structures used to
train membrane-specific statistical potentials

Fig. 2. Local QMEANBrane scores mapped on the best performing
model (mod9jk) regarding RMSD of the GPCR Dock experiment
2008. Reference structure (2.6 Å crystal structure of a human A2A
adenosine receptor bound to ZM241385, PDB: 3eml) and
membrane-defining planes are shown in white
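
The superposition-free idea behind the lDDT score of Section 2.1 can be
illustrated with a strongly simplified sketch. It assumes one point per
residue (e.g. the Cα atom) and the distance tolerance thresholds of 0.5,
1, 2 and 4 Å from the lDDT publication; the published score uses all
atoms and additional checks. The function name and its inputs are
hypothetical and chosen for illustration only.

```python
import math


def local_lddt_ca(model, reference, radius=10.0,
                  thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-like score from one (x, y, z) point per residue.

    Simplification (assumption): only one atom per residue is used.
    Requires Python >= 3.8 for math.dist.
    """
    n = len(reference)
    scores = []
    for i in range(n):
        preserved = 0
        total = 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local reference distances are considered
            d_mod = math.dist(model[i], model[j])
            for tolerance in thresholds:
                total += 1
                if abs(d_ref - d_mod) < tolerance:
                    preserved += 1
        # Residues without neighbours inside the radius get a score of 0.
        scores.append(preserved / total if total else 0.0)
    return scores
```

Because only distances within each structure are compared, no
superposition is needed and a rigid domain movement affects only the
residues at the domain boundary.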

2.2 Membrane segment definition 

The OPM database (Orientations of Proteins in Membranes; Lomize
et al., 2006a) applies minimization of a free energy expression to predict
the transmembrane part of a protein structure. In this work, we use a
similar but simplified approach, still resulting in a robust approximation
of the membrane segment definition. The energy expression is defined as

\Delta G = \sum_i \sigma_i^{\mathrm{wat}\to\mathrm{lip}} \, f(z_i) \, \mathrm{ASA}_i    (1)

with \sigma_i^{\mathrm{wat}\to\mathrm{lip}} representing the transfer
energy from water to decadiene for atom i per Å² (Lomize et al., 2004),
f(z_i) the hydrophobicity as a function of the distance z_i to the
membrane centre, and ASA_i the accessible surface area of atom i in Å²
as calculated with NACCESS (www.bioinf.manchester.ac.uk/naccess). Not all
surface-facing atoms, as determined by NACCESS, are in contact with the
membrane, even if they lie within the lipid bilayer, as is the case for
hydrophilic pores. To determine the subset of surface atoms in direct
contact with the lipid bilayer, the protein structure surface as
calculated by MSMS (Sanner et al., 1996) is placed onto a 3D grid,
marking every cube in the grid containing surface vertices. The
application of a flood fill algorithm
(http://lodev.org/cgtutor/floodfill.html) on every layer along the z-axis
then allows the generation of a subset of potentially membrane-facing
atoms.
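
The per-layer flood fill can be sketched as follows. This is a minimal
illustration, not the QMEANBrane implementation: it assumes that one
z-layer of the grid is given as rows of booleans (True where a cube
contains surface vertices), that the layer is padded with empty cells at
its border, and that 4-connectivity is used. The function name is
hypothetical.

```python
from collections import deque


def outer_surface_cells(layer):
    """Return the surface cells of one grid layer reachable from outside.

    layer: list of rows; True marks a cell containing surface vertices.
    """
    rows, cols = len(layer), len(layer[0])
    seen = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the fill with every empty cell on the border of the layer.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not layer[r][c]:
                seen[r][c] = True
                queue.append((r, c))
    outer = set()
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if layer[nr][nc]:
                outer.add((nr, nc))  # surface cell touched from outside
            elif not seen[nr][nc]:
                seen[nr][nc] = True
                queue.append((nr, nc))  # keep flooding through empty space
    return outer
```

Surface cells lining an enclosed pore are never reached by the fill and
are therefore excluded, which matches the purpose described above.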

The parameters describing the membrane (i.e. tilt angle relative to 
z-axis, rotation angle around z-axis, membrane width and distance of 
membrane centre to origin) first undergo a coarse grained sampling to 
identify the 10 best parameter sets for further refinement using a 
Levenberg-Marquardt minimizer. This procedure is repeated several 
times with different initial orientations of the structure to find the set 
of parameters leading to the lowest total free energy. 
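
The coarse-grained sampling step can be sketched as a plain grid search
over Equation (1). For brevity, the sketch below varies only the membrane
centre and width along a fixed z-axis (tilt and rotation are omitted), and
it uses a toy step-function hydrophobicity profile, which is an assumption
made for illustration and not the profile used by QMEANBrane. All names
are hypothetical. The subsequent Levenberg-Marquardt refinement is not
shown.

```python
import heapq
import itertools


def slab_profile(dz, width):
    """Toy hydrophobicity profile (assumption): 1 inside the slab, else 0."""
    return 1.0 if abs(dz) <= width / 2.0 else 0.0


def transfer_energy(atoms, centre, width, profile=slab_profile):
    """Equation (1): sum over atoms of sigma_i * f(z_i) * ASA_i.

    atoms: iterable of (sigma, z, asa) tuples with z along the membrane
    normal; only membrane-facing surface atoms should be passed in.
    """
    return sum(sigma * profile(z - centre, width) * asa
               for sigma, z, asa in atoms)


def coarse_search(atoms, centres, widths, keep=10):
    """Return the `keep` lowest-energy (centre, width) parameter sets."""
    grid = itertools.product(centres, widths)
    return heapq.nsmallest(
        keep, grid, key=lambda p: transfer_energy(atoms, p[0], p[1]))
```

Each returned parameter set would then serve as a starting point for the
local minimizer, and the refined set with the lowest energy is kept.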

The bilayer consists of a hydrocarbon core flanked by interface regions
with a large chemical heterogeneity (White et al., 2001). It is known that
the properties of a membrane protein are strongly influenced by the
interaction with the phospholipid bilayer, and a simple split into a
membrane and soluble part would not faithfully reflect the variation of
molecular properties along the membrane axis (Bernsel et al., 2008). To
capture these variations along the membrane axis, we split the
transmembrane proteins into three parts, which are treated separately: an
interface part consisting of all residues with their Cα atom positions
within 5 Å of the membrane-defining planes, a core membrane part
consisting of all residues with their Cα atom positions in between the two
membrane-defining planes not intersecting with the interface residues and
finally, a soluble protein part consisting of all remaining residues.
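
The three-way split can be written down compactly once the structure is
oriented with the membrane normal along z. The following sketch assumes
that the two membrane-defining planes lie at centre ± width/2; the
function name and the string labels are hypothetical.

```python
def classify_residue(z_ca, centre, width, interface=5.0):
    """Assign a residue to 'interface', 'membrane' or 'soluble'.

    z_ca: z coordinate of the residue's C-alpha atom in a frame where
    the membrane normal is the z-axis (assumption of this sketch).
    """
    upper = centre + width / 2.0
    lower = centre - width / 2.0
    # Interface: C-alpha within 5 Angstrom of either membrane plane.
    if min(abs(z_ca - upper), abs(z_ca - lower)) <= interface:
        return "interface"
    # Core membrane: between the planes and not part of the interface.
    if lower < z_ca < upper:
        return "membrane"
    return "soluble"
```

Testing the interface condition first ensures that the core membrane
part does not intersect with the interface residues.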

2.3 Model quality predictors 

To assess the quality of membrane protein models, we mainly rely on
statistical potential terms, combined with the relative solvent
accessibility of each residue as calculated by DSSP (Kabsch and Sander,
1983). The four statistical potential terms (their exact parameterizations
are described in the Supplementary Material) are the following:

(1) All-atom Interaction Term: Pairwise interactions are considered 
between all chemically distinguishable heavy atoms. A sequence 
separation threshold has been introduced to allow focusing on 
long-range interactions and reduce the influence of local secondary 
structure. Interactions originating from atoms of residues closer in 
sequence than this threshold are neglected. 

(2) Cβ Interaction Term: This term assesses the overall fold by only
considering pairwise interactions between Cβ positions of the 20
standard amino acids. In case of glycine, a representative of the Cβ
position is constructed using the backbone as anchor. The same
sequence separation as in the all-atom interaction term is applied.

(3) Solvation Term: Statistics are created by counting surrounding
atoms around all chemically distinguishable heavy atoms not
belonging to the assessed residue itself.

(4) Torsion Term: The central φ/ψ angles of three consecutive amino
acids are assessed based on the identity of the involved amino acids
using a grouping scheme described by Solis and Rackovsky (2006).
The torsion term trained on soluble structures is applied to the whole
membrane protein model. Conversely, solvation and interaction terms are
specifically trained for and applied to the soluble, membrane and
interface segments, with different potentials for α-helical and β-barrel
transmembrane structures. A residue belonging to one of these parts
'interacts' with all atoms in the full model, and a final score is
assigned by averaging all scores originating from interactions associated
with this specific residue. For the solvation and torsion terms, we use a
formalism closely related to the statistical potentials of mean force
(Sippl, 1990). However, instead of referring to an energy expression, we
rather look at the problem as a log-odds score comparing the probability
of observing a particular interaction between partners s with
conformation c with that of some reference state:

S(c|s) = -\ln\left( \frac{p(c|s)}{p(c)} \right)    (2)



In case of sparse data, p(c|s) cannot be expected to be saturated. Sippl
and co-workers have proposed to use a combination of the extracted
sequence-specific probability density function (pdf) and the reference
state. The influence of the reference state vanishes at a rate determined
by the newly introduced parameter σ towards large numbers of
interactions (N) with sequence s:

\hat{p}(c|s) \approx \frac{1}{1+N\sigma} \, p(c) + \frac{N\sigma}{1+N\sigma} \, p(c|s)    (3)

Using the aforementioned formalism, this leads to

S(c|s) = \ln(1+N\sigma) - \ln\left( 1 + N\sigma \, \frac{p(c|s)}{p(c)} \right)    (4)
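
The behaviour of Equation (4) at the two extremes is easy to check
numerically. The sketch below is illustrative only; the function name is
hypothetical and the value of the sparse-data parameter (called `sigma`
in the code) must be supplied by the caller, as it is not stated here.

```python
import math


def sparse_log_odds(p_cs, p_c, n, sigma):
    """Equation (4): ln(1 + N*sigma) - ln(1 + N*sigma * p(c|s) / p(c)).

    p_cs:  observed sequence-specific probability p(c|s)
    p_c:   reference state probability p(c), must be > 0
    n:     number of observed interactions N with sequence s
    sigma: sparse-data parameter (value is an assumption of the caller)
    """
    return math.log(1.0 + n * sigma) - math.log(1.0 + n * sigma * p_cs / p_c)
```

Without any observations (N = 0) the score is zero, i.e. the reference
state is returned, whereas for large N the score approaches the plain
log-odds score -ln(p(c|s)/p(c)) of Equation (2).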



Because of the increased abundance of structural information for soluble
protein structures during the last decades, the use of the σ parameter has
become largely unnecessary. However, for membrane proteins, data
scarcity is still an issue and needs to be handled accordingly. In the
Supplementary Materials, an analysis of the saturation behaviour of
the different statistical potential terms is provided, suggesting a
sufficient amount of training data for the solvation term, whereas the two
interaction terms require more data to be fully saturated (Supplementary
Fig. S1). For these cases, we introduced a treatment for sparse data by
assuming that the statistics for soluble proteins are fully saturated. In
other words, if insufficient data are available from membrane
structures, we refer to the information we have from all protein
structures to get a hybrid score:

HS(c|s) = -\ln\left( \frac{1}{1+N\sigma} \, f_1 + \frac{N\sigma}{1+N\sigma} \, f_2 \right) = \ln(1+N\sigma) - \ln(f_1 + N\sigma f_2)    (5)

with f_1 representing the ratio between the probabilities of
sequence-specific interactions and a reference state, where the pdfs of
the specific interactions are saturated, and f_2 the ratio between the
probabilities of sequence-specific interactions and a reference state,
where the pdfs of the specific interactions are not necessarily saturated,
as may occur for membrane- and interface-specific cases.

For regions of the pdf with zero probability as they, for example, occur 
at low distances in pairwise interaction terms, we applied a constant cap 
value to avoid infinite scores. 
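
The hybrid score of Equation (5), including the cap for regions of zero
probability, can be sketched as follows. The function name, the argument
names and the cap value of 10 are assumptions made for illustration; the
cap actually used is not stated here.

```python
import math


def hybrid_score(f_saturated, f_sparse, n, sigma, cap=10.0):
    """Equation (5): ln(1 + N*sigma) - ln(f1 + N*sigma * f2), capped.

    f_saturated: ratio p(c|s)/p(c) from the saturated (soluble) statistics
    f_sparse:    same ratio from the sparse membrane/interface statistics
    n:           number of observations N in the sparse statistics
    cap:         constant upper bound on the score (value is an assumption)
    """
    mixed = f_saturated + n * sigma * f_sparse
    if mixed <= 0.0:
        return cap  # zero probability in both pdfs: avoid an infinite score
    return min(cap, math.log(1.0 + n * sigma) - math.log(mixed))
```

For N = 0 the score reduces to -ln(f_1), i.e. the potential derived from
the saturated statistics, and for large N it approaches -ln(f_2), the
membrane-specific potential.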



2.4 Training datasets for statistical potentials 

The pdfs to calculate the statistical potentials for the soluble part are
built using statistics extracted from a non-redundant set of
high-resolution X-ray structures. PISCES (Wang and Dunbrack, 2003) has
been used with the following parameters: sequence identity threshold 20%,
resolution threshold 2 Å and R-factor threshold 0.25. Because only
standard amino acids can be handled by QMEANBrane, a prior curation of the
training structures is necessary. Non-standard amino acids such as
phospho-serine or seleno-methionine have therefore been mapped to their
standard parent residues. For the selection of appropriate membrane
protein structures, we rely on the OPM database (Lomize et al.,
2006b). As of October 2013, OPM contained 746 unique PDB IDs of
structures with transmembrane segments. Applying a resolution threshold
of 2.5 Å, removing all chains with <30 membrane-associated residues
and considering only one chain in case of homo-oligomers results in 283
remaining chains from 200 structures. Clustering the chains based on
their SEQRES sequences with kClust (Hauser et al., 2013) using a
sequence identity threshold of 30% resulted in 187 clusters, 140 of them
from helical transmembrane structures and 47 from β-barrel structures.
All entries are used in the calculation of the pdfs, where a chain
originating from a cluster with n members is downweighted and contributes
with a weight of 1/n to the final distributions. These final distributions
have then been extracted by considering the corresponding chains, using
the full protein structure in the oligomeric state as assigned by OPM as
environment.
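
The 1/n downweighting of chains from the same sequence cluster can be
illustrated with a small histogram sketch. The data layout (pairs of
chain identifier and histogram bin) and all names are hypothetical.

```python
from collections import Counter, defaultdict


def weighted_histogram(observations, cluster_of):
    """Accumulate counts in which each chain contributes with weight 1/n.

    observations: iterable of (chain_id, bin_index) pairs
    cluster_of:   dict mapping chain_id to its sequence cluster id
    """
    cluster_sizes = Counter(cluster_of.values())
    histogram = defaultdict(float)
    for chain_id, bin_index in observations:
        n_members = cluster_sizes[cluster_of[chain_id]]
        histogram[bin_index] += 1.0 / n_members
    return dict(histogram)
```

In this way every cluster carries the same total weight per chain
regardless of how many homologous chains it contains, while all available
structures still contribute to the statistics.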

2.5 Datasets for training linear combinations 

A set of 3745 models for soluble proteins was generated by selecting a set
of non-redundant high-resolution reference structures from the PDB
using PISCES (maximum 20% sequence identity, resolution better than 2 Å,
X-ray only), extracting their amino acid sequences, and building models
using the automated SWISS-MODEL pipeline (Kiefer et al., 2009),
excluding templates with a sequence identity >90% to the target
(P. Benkert, personal communication). OPM was used to identify
reference structures (resolution <3.0 Å) to generate membrane protein
models. Structures with <30 membrane-associated residues and
hetero-oligomeric complexes were excluded. In all, 132 unique PDB IDs,
which had more than one suitable template, have been selected as targets
for modelling. Templates identified with HHBlits (Remmert et al., 2012)
showing a sequence alignment coverage >50% served as input for MODELLER
(Sali and Blundell, 1993) and resulted in 3226 models with oligomeric
states equivalent to the template structure. Removal of redundancy, i.e.
models originating from templates with the same sequence, and removal of
obviously incorrect oligomeric states upon visual inspection resulted in a
set of 557 models, 386 with helical transmembrane parts and 171 β-barrels.

2.6 Spherical smoothing for noise reduction 

Averaging/smoothing can reduce noise introduced by quality predictors
on a per-residue level, resulting in single residue scores, which more
accurately reflect the local model quality. Smoothing in space tends to
outperform sequential smoothing. In the proposed algorithm, every
residue is represented by its Cα position. The final quality predictor
score for a residue is calculated as a weighted mean of its own value and
the values associated with surrounding residues:

\bar{s}_i = \sum_j w_j \, s_j    (6)

with \bar{s}_i representing the final score at position i, w_j the weight
of the score at position j and s_j the score at position j. The weights
are calculated in a Gaussian-like manner and normalized, so they sum up to
one:

w_j = \begin{cases} \frac{1}{Z} \exp\left( -\frac{1}{2} \left( \frac{d_{ij}}{\sigma} \right)^2 \right) & d_{ij} < 3\sigma \\ 0 & \text{else} \end{cases}    (7)

with w_j representing the weight of the score at position j, d_{ij} the
distance from position i to position j, σ the standard deviation of the
Gaussian-like formalism to control how fast the influence of a
neighbouring score vanishes as a function of the distance (5 Å turned out
to be a reasonable σ) and Z the normalization factor.
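
The spherical smoothing of Equations (6) and (7) can be sketched in a few
lines. The sketch assumes a standard Gaussian kernel for the
'Gaussian-like' weights and a cutoff of three standard deviations; the
function name and the input layout are hypothetical.

```python
import math


def smooth_scores(positions, scores, sigma=5.0):
    """Spatially smooth per-residue scores with truncated Gaussian weights.

    positions: one (x, y, z) C-alpha coordinate per residue
    scores:    one raw quality predictor score per residue
    sigma:     width of the Gaussian in Angstrom
    Requires Python >= 3.8 for math.dist.
    """
    smoothed = []
    for p_i in positions:
        weights = []
        for p_j in positions:
            d_ij = math.dist(p_i, p_j)
            if d_ij < 3.0 * sigma:
                weights.append(math.exp(-0.5 * (d_ij / sigma) ** 2))
            else:
                weights.append(0.0)  # residues beyond 3*sigma are ignored
        z = sum(weights)  # normalization factor Z; >= 1 because d_ii = 0
        smoothed.append(sum(w * s for w, s in zip(weights, scores)) / z)
    return smoothed
```

Since the weights depend on distances in space rather than on sequence
separation, residues that are far apart in sequence but in contact in the
model contribute to each other's final score.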
2.7 Per amino acid weighting scheme

QMEANBrane uses a linear model fitted on the per-residue lDDT score
to combine the single quality predictors. To remove amino acid-specific
biases, such a linear model is trained for every standard amino acid:

S_i = \sum_j w_j \, s_{ij}    (8)

where S_i is the combined score of the residue at position i, w_j the
weight of quality predictor j and s_{ij} the score of quality predictor j
at position i.
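
Applying such an amino acid-specific linear model amounts to a weighted
sum with a weight set selected by residue type. In the sketch below, the
function name, the predictor names and the nested dictionary layout are
hypothetical, and any weights passed in are placeholders rather than
fitted QMEANBrane parameters.

```python
def combined_score(residue_name, predictor_scores, weights):
    """Weighted sum of predictor scores using residue-specific weights.

    residue_name:     three-letter code of the residue, e.g. 'ALA'
    predictor_scores: dict mapping predictor name to its (smoothed) score
    weights:          dict mapping residue name to a dict of predictor
                      weights (one linear model per amino acid)
    """
    residue_weights = weights[residue_name]
    return sum(residue_weights[name] * value
               for name, value in predictor_scores.items())
```

Keeping one weight set per amino acid allows, for example, the solvent
accessibility term to be weighted differently for hydrophobic and charged
residues.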

2.8 Implementation 

QMEANBrane is designed on a modular basis, implementing computationally
expensive tasks in a C++ layer. All functionality is made fully
accessible from the Python language and can directly be embedded into
the computational structural biology framework OpenStructure (Biasini
et al., 2010, 2013), allowing users to assemble custom assessment
pipelines to address more specific requirements.

3 RESULTS AND DISCUSSION 

3.1 Membrane prediction accuracy 

To evaluate the performance of our membrane finding algorithm,
a comparison with the result obtained by OPM has
been performed on the 200 structures used for training of the
membrane-specific statistical potentials. At this point, OPM is
assumed to be the gold standard, even though it is itself the result
of a calculation. Considering the membrane width as the main
feature of accuracy, 95% of the absolute width deviations are
<4 Å. In terms of translational distances, this corresponds to a
'misprediction' of 2-3 residues for helices and about 1-2 residues
for sheets (Fig. 1). Interestingly, using this approach, it is not
only possible to automatically detect transmembrane regions but
also to distinguish between transmembrane and soluble
structures in general (Supplementary Fig. S3).

3.2 Performance on the test dataset 

For a first analysis of performance on predicting local scores of
membrane-associated residues in transmembrane protein
models, we used the previously described model set for training
the linear weights. Clusters have been built by applying kClust
on the target sequences with a sequence identity threshold of
30%. The local scores for the membrane-associated residues of
one cluster have then been predicted using linear models trained
on all residues from models not belonging to that particular
cluster (Table 1; Supplementary Fig. S6).
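
The evaluation scheme corresponds to a leave-one-cluster-out
cross-validation, which can be sketched as a generator of training and
test splits. The function name and the dictionary layout are
hypothetical.

```python
def leave_one_cluster_out(cluster_of):
    """Yield (held_out_cluster, training_models, test_models) splits.

    cluster_of: dict mapping a model id to the sequence cluster of its
    target; models of the held-out cluster are never used for training.
    """
    for held_out in sorted(set(cluster_of.values())):
        test = [m for m, c in cluster_of.items() if c == held_out]
        train = [m for m, c in cluster_of.items() if c != held_out]
        yield held_out, train, test
```

Splitting by cluster rather than by individual model prevents models of
closely related targets from appearing in both the training and the test
data.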

3.3 Independent performance evaluation on models of the 
GPCR Dock experiments 

Not many independent compilations of membrane protein
models with known target structures exist. For a performance
evaluation and comparison with other widely used quality
assessment tools, we rely on the models generated during the GPCR
Dock experiments 2008/2010 (Kufareva et al., 2011; Michino
et al., 2009) (Fig. 2). A total of 491 models for three different
targets, the human dopamine receptor, the human adenosine
receptor and the human chemokine receptor, were available.
Receiver operating characteristic (ROC) analysis with the local
lDDT as target value has been performed on all
membrane-associated residues as defined by OPM, showing a clear
superiority of QMEANBrane over other methods such as ProQ2 (Ray
et al., 2012), QMEAN (Benkert et al., 2011), ProQM (Ray et al.,
2010), Prosa (Wiederstein and Sippl, 2007), Verify3D (Luthy
et al., 1992) or DFire (Zhou and Zhou, 2002) (Fig. 3).
Removing all GPCR/Rhodopsin structures from the training
data has only a minor effect. See Supplementary Figure S4 for
a more detailed performance analysis taking other measures of
similarity into account. Because ProQM is the only other method
specifically developed for the particular case of membrane
protein model quality assessment, we also performed a direct
comparison of QMEANBrane and ProQM on the dataset used
to test/train ProQM (Supplementary Fig. S5).

Table 1. Performances of single quality predictors and their combination
on membrane-associated residues in our test set, measured as Pearson's r
between predicted score and actual local lDDT

Quality predictor          Helical structures    β-barrel structures
Exposed                    0.39                  0.15
Torsion                    0.43                  0.47
Cβ interaction             0.51                  0.49
Solvation                  0.55                  0.51
All-atom interaction       0.63                  0.58
All predictors combined    0.71                  0.67

Note: Even for single predictors, an amino acid-specific linear model has
been trained to remove amino acid-specific biases.

[Figure 3, axis label: False Positive Rate]

Fig. 3. ROC analysis of all membrane-associated residues of the models
of the GPCR Dock experiments with local lDDT as target value and a
class cutoff of 0.6
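
The ROC analysis described in Section 3.3 can be sketched as follows.
Residues are labelled as correctly modelled when their local lDDT reaches
the class cutoff of 0.6 (treating the cutoff as inclusive is an
assumption), and the predicted scores are used for ranking. The function
names are hypothetical; tied predictions are not treated specially, and
both classes are assumed to be present.

```python
def roc_points(predicted, lddt, cutoff=0.6):
    """Return (false positive rate, true positive rate) points.

    predicted: predicted local quality score per residue
    lddt:      actual local lDDT per residue, used to define the classes
    """
    labels = [value >= cutoff for value in lddt]
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(predicted, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, is_positive in ranked:
        if is_positive:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area close to 1 indicates that residues with high predicted scores are
almost always those with a local lDDT above the cutoff, whereas 0.5
corresponds to a random ranking.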

3.4 Retrospective analysis of modelling examples 

To illustrate the usefulness of QMEANBrane in tackling problems
as they occur in real modelling cases, two targets with
known structures have been selected for a more detailed analysis
using the recently released SWISS-MODEL workspace (Biasini
et al., 2014): the translocating pyrophosphatase from Vigna
radiata (PDB ID: 4A01) and a dopamine transporter of
Drosophila melanogaster (PDB ID: 4M48). Models based on
different target-template alignments have been compared to
test QMEANBrane's capability of detecting incorrect
alignments, particularly alignment shifts in transmembrane helices.
(Alignments are available in the Supplementary Materials.)

Fig. 4. Difference of QMEANBrane scores of three dopamine
transporter models with modified alignments versus the model built with
the initial HHBlits alignment, represented by the first horizontal bar.
Insertions are marked black, and deletions are marked white. Second
bar: shift of the insertion towards the N-terminus in front of helix 4,
third bar: shift of insertion towards the N-terminus in between helices 4
and 5, fourth bar: shift of the insertion towards the C-terminus

For the pyrophosphatase, the sodium translocating
pyrophosphatase from Thermotoga maritima (PDB ID: 4AV3) is
a rather close homologue (sequence identity >40%).
Nevertheless, the alignments provided by BLAST (Altschul
et al., 1990) and HHBlits differ significantly. Because the
BLAST alignment has a lower coverage, not including the first
transmembrane helix, only the part covered by both alignments
is considered. Supplementary Figure S7 shows a comparison of
the QMEANBrane scores from the two models built with the
different alignments. Two transmembrane helices contain an
alignment shift of three residues, resulting in a clear local increase
of the QMEANBrane scores of the model built with the HHBlits
alignment relative to the model built with the BLAST alignment.
The higher quality of the HHBlits model is confirmed by its
global lDDT of 0.63 versus 0.59 for the BLAST model.

For the dopamine transporter example, we chose an amine
transporter from Aquifex aeolicus VF5, identified by HHBlits
with a sequence identity of ~24%, as the primary template.
Despite the good coverage, a major problem occurs in
transmembrane helix 5. The initial HHBlits alignment has an insertion of
three residues enforcing a helix break and an unnatural bulge
within the transmembrane part. To analyse possible modifications
of the initial alignment, we rely on QMEANBrane to
compare the models built with alternative alignments with the
initial model (Figs 4 and 5).

Three different alternative alignments were considered. The
first is to shift the helix insertion towards the C-terminus.
Despite the increase of the QMEANBrane score at the location
of the alignment modification, the scores in helix 5 towards the
C-terminus drop significantly, suggesting no improvement of the
overall model quality. As a second alternative, the insertion has
been shifted into the loop connecting transmembrane helices 4
and 5. Because of their proximity, a distortion of both involved
helix endings was inevitable and thus unfavourable. The third
alternative, a shift of the insertion towards the N-terminus before
helix 4 combined with an additional deletion in the aforementioned
loop, which increases the local sequence identity in helix 4,
consistently increases the QMEANBrane scores in helices 4 and 5, as
well as in the helices close in space. These findings are confirmed by
the global lDDT scores of the models built based on those
alignments (initial alignment: 0.54, shift into middle: 0.54, shift
towards C-terminus: 0.53, shift towards N-terminus: 0.57).

Fig. 5. Structural effects of the alignment modifications shown in
Figure 4. The model based on the initial HHBlits alignment is coloured
white; the other models are coloured according to the horizontal bar
alignment representation in Figure 4



4 CONCLUSION 

Investigating function and interactions in membrane proteins is
an active field of research, with modelling techniques as an
important tool to bridge the gap when structural data are missing.
Comparative modelling methods automatically profit from
the increased number of available experimental membrane
structures, which can be used to build models for membrane
proteins (Forrest et al., 2006). However, most knowledge-based
approaches fail in assigning reliable local quality estimates when
confronted with the unique structural features and interactions
resulting from direct contact with the phospholipid bilayer.

With QMEANBrane, we present a framework that widely
covers the aspects of membrane protein model quality
assessment. In a first step, our membrane detection method
reliably locates the transmembrane part of the model. We
introduce an interface region to account for the non-isotropy
of protein properties along the z-axis. Statistical potential
terms were trained specifically for these three regions,
introducing a new hybrid potential formalism to circumvent problems
arising from a lack of sufficient training data. The final local
scores are then calculated using linear models trained for all 20
standard amino acids. We could show a clear improvement in
accuracy over widely used quality assessment methods when
considering α-helical transmembrane structures. It is possible to
detect errors introduced in the modelling procedure such as
incorrect alignments, which would facilitate the visual exploration
of alternative alignments, e.g. as suggested previously
(Barbato et al., 2012).

Despite similar observed overall performance for β-barrel
structures, problems arise with shifted alignments, as they can
occur when aligning sequences from remote homologues. The
low number of pairwise atomic interactions in combination
with the regular hydrophobicity pattern often observed in
alignment shifts by two residues hampers the reliable detection of such
errors and requires further investigation.